-
Notifications
You must be signed in to change notification settings - Fork 347
Add support for encoding_rs behind a new feature flag #262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
.project | ||
.settings | ||
*~ | ||
*.bk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you should add all of that to a global ignore file anyway or you are going to need to add it in all projects all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I removed that commit from the pull request (and also rebased the pull request).
a3e676f
to
f8be594
Compare
f8be594
to
7330a72
Compare
r? @SimonSapin |
Thank you for submitting this Henri. I haven’t been in a hurry to merge it because as far as I know nothing is blocked on it. Is it? On the other hand there is a small concern that Cargo features are supposed to be purely additive. If two crates A and B both depend on a third crate C and enable different sets of C’s optional features, Cargo build C with the union of the features. What should happen when both I’d also like to remove support for rust-encoding in rust-url eventually, but that would be a breaking change. And since so much of the Rust ecosystem has come to (indirectly) depend on this crate, the pain of upgrading the world should take a stronger justification than that. |
For additivity to work I would suggest exposing a |
@Manishearth This is not just a module exposing new items, it affect the behavior of methods that exist otherwise. |
Perhaps we can use a trait here? So that the methods all act on |
Yes, another approach could be to not code for either encoding library in rust-url itself and instead have traits that users can implement. But this also would likely be a breaking change. |
Nothing is blocking on this at the moment.
Having a dependency graph that includes both encoding_rs and rust-encoding (the real one, not the encoding_rs_compat fork) is likely an unwanted situation: You end up with a bloated binary if you ship the lookup tables for both. For that reason, I thought it was a feature rather than a bug that enabling both Now that I think about this more: encoding_rs_compat is all about allowing apps to migrate to shipping only encoding_rs's lookup tables without forcing them to migrate away from all uses of the rust-encoding API at the same time. From this point of view, I can see why it could be desirable for rust-url to expose both It would be a bit ugly, since Rust doesn't have method overloading, but doable, for rust-url to expose differently-named API surface when taking (The main problem with citing encoding_rs_compat as a solution for anything is that it can only be used in non-crates.io top-level Cargo artifacts, because to use it, you have to abuse a Cargo feature that's meant for development only and that's banned in crates uploaded to crates.io. This might prevent rust-url from having code that maps |
There is something I still don't understand, why couldn't |
UTF-16 support isn't the only outward difference between encoding_rs and rust-encoding. For the streaming API, encoding_rs takes a caller-allocated output buffer whereas rust-encoding takes a caller-supplied object that provides a method for rust-encoding to call. The encoding_rs approach is more efficient and more FFI-friendly (which is, of course, part of the Gecko focus of encoding_rs). For the non-streaming Rust API, rust-encoding always copies, but encoding_rs borrows when possible (valid UTF-8 to UTF-8 decode, ASCII-only in ASCII-compatible encoding to UTF-8 decode, UTF-8 to UTF-8 encode and ASCII-only to ASCII-compatible encoding encode). Implementation-wise, it wouldn't be a matter of patching in UTF-16. Getting encoding_rs-like decode performance involves different UTF-8-targeting internals, too. (And different outward API as described above, but the encoding_rs internals are faster for decode even when using the rust-encoding outward API via encoding_rs_compat.) As minor points: The rust-encoding streaming API can't signal ISO-2022-JP unmappables in a spec-compliant way (the spec changed after the rust-encoding API was designed). Additionally, after rust-encoding had been released, the spec changed the case of the encoding names to include upper-case letters in IANA and browser tradition, which from the Rust perspective would be a breaking change for rust-encoding. |
Well rust-encoding isn't 1.0.0 so breaking changes aren't a problem anyway, and it means the crate is not finalised and we can theoretically pursue any breaking improvement and whatnot. Did we discuss about such changes with the rust-encoding maintainers before forking their crate? If not, why? |
For FFI (and also efficiency concerns), couldn't you pass a caller-supplied object that fills a caller-allocated output buffer, and let rustc optimise away that intermediate object?
I don't see why this wouldn't be a welcomed change in rust-encoding.
Interesting, did you document the differences precisely somewhere and I missed that? |
Theoretically. In practice, having rust-encoding 1.0.0 make the sort of API changes that I think are appropriate for the goals of encoding_rs would be as big a change for dependent crates to adapt to as switching from rust-encoding 0.2.x to encoding_rs.
There was some discussion, but not about the encoding_rs_compat fork per se. I thought having different internals and a different API would be appropriate for the Gecko use case. Initially I had no code (obviously). Showing up to a project with no code and saying I want to redesign both the internals and the API would have been rather presumptuous, and it would be quite appropriate for the author of a library not to agree to replacing the internals and externals of the library because a random person on the Internet shows up and says so. The way to empirically show that my design had merit was to implement it. Since the implementation shares no code with rust-encoding, it doesn't seem bad for it to be in a fresh repo. It is somewhat impolite to show up with competing crate instead of improving the existing one, but before being able to show that the change is an improvement, I though it would have been even more impolite to suggest a re-do of both the internals and externals of the existing crate (and to do some from the special Gecko point of view).
How would that be an improvement over the caller just passing
Main API differences:
Main current internal differences:
Upcoming differences:
There's a performance comparison reflecting the status as of the end of November, which doesn't include the gains from (Additionally, there seems to be a difference in attitude towards the Encoding Standard between rust-encoding and encoding_rs. rust-encoding gives encodings potentially non-Encoding Standard names and qualifies the Encoding Standard names as "WHATWG names", suggesting that rust-encoding doesn't recognize the Encoding Standard as the only or even the primary source of naming. rust-encoding also supports non-Encoding Standard encodings. By contrast, encoding_rs is strictly an implementation of the Encoding Standard. No other encodings. No other naming schemes.) |
This is what <1.0.0 versions are for, big breaking changes before stabilisation. There is much precedent about that in the Rust ecosystem, the biggest being libc. I'm also not sure about your claim given you managed to do
Serde is an example of a project that uses RFCs very well, and I would have loved to see an RFC for that design change of the
The impoliteness is more about publishing The name of the crate
Could
The improvement would be that the caller could also pass something similar to the current API in
Would you mind if we removed the bogus names that aren't part of the spec, @lifthrasiir? |
According to lifthrasiir/rust-encoding#92 (comment), it doesn't seem like all these changes couldn't be incorporated into |
If you use encoding_rs_compat, you get the performance benefits (for decode; performance regression for encode) of the encoding_rs internals but you don't get the API benefits. For non-streaming, you don't get borrows. For streaming, there's an extra copy if you implement
Gecko rather fundamentally has FFI concerns. It seemed inappropriate to try to push those concerns onto a library that puts the emphasis on being a Rust-only library. But I don't think it follows that Gecko shouldn't be able to address Gecko-relevant concerns. As for publishing
What's the concern? The name in The purpose of
AFAICT, there are three sources of virtual calls in
|
My point was that it seems weird to publish there a crate with a name very similar to another crate that does pretty much the same thing differently, namely
I see.
If some third-party crate A ends up using I'm more and more of the opinion that we should remove support for the both of them and publish rust-url 2.0.0 with a rust-url-encoding crate.
That it is extensible doesn't mean it must be done through virtual calls. What could go wrong about still having a @lifthrasiir would you be against making the set of encodings unextensible, otherwise?
Again, these could be plain old traits not used as trait objects, and a caller-allocated buffer could implement them, no? |
Note that if we merge that PR, and then later release |
The obvious good name for a crate that implements the Encoding Standard is
My apologies.
One option is not merging this PR and relying on
Maybe I don't know the details of type erasure well enough, but I thought that if you want third parties to see
Traits vs. trait objects aside, the semantics of Another use of a streaming API is gathering the output to a growable list of fixed-size buffers. This is the HTML parser case in Gecko. With a |
No need for apologies, I should have enclosed all that part by a disclaimer saying this is just lamentations from someone who cares a bit too much about things. In the end, if we manage to just move your changes to a new version of ACK the rest of your reply. |
☔ The latest upstream changes (presumably 3e5541e) made this pull request unmergeable. Please resolve the merge conflicts. |
My current plan is to deprecate the current rust-url support (but not remove until we have stronger reasons to do rust-url 2.0), and instead add a mechanism where users of the url crate can implement a trait to provide non-UTF-8 encodings support. It’s such a niche feature that the loss of convenience isn’t a problem. This plan makes this PR obsolete, so don’t spend time rebasing it :) Thanks anyway for submitting it! Later, (when other Servo dependencies have gotten a similar treatment,) I’ll switch Servo entirely from rust-encoding to encoding_rs. |
I'm closing this as it seems like we're taking a different path:
Thank you! |
This is an implementation of the design discussed f2f with @SimonSapin and briefly with @Manishearth: Introducing optional support for encoding_rs as the query encoding conversion back end behind a new feature flag (
query_encoding_rs
).Observable changes:
Encoding
type means in encoding_rs.)_charset_
parameter, the name of the encoding is the canonical name per current spec, so the name can now contain upper-case letters.to_output_encoding()
returns UTF-8 for the replacement encoding.EncodingRef
in the outer API changes from&'static encoding::types::Encoding
to&'static encoding_rs::Encoding
.encoding_rs
.This change is